The \((X'X)^{-1}\) matrix for the model \(y=\beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3 x_3+\beta_4 x_4+\beta_5 x_5+\beta_6 x_6+\varepsilon\) is given below.
If MSE = 1.395 and n = 38, compute the following (keep 4 or more decimal places; do not round in the intermediate steps):
\(se(\hat\beta_4)\)
\[se(\mathbf{\hat\beta_4})=\sqrt{MSE\times C_{55}}=\sqrt{1.395\times0.069}=0.3102499\]
\[Cov(\mathbf{\hat\beta_2,\hat\beta_4})=MSE\times C_{35}=1.395\times(-0.035)=-0.048825\]
\[se(\mathbf{\hat\beta_2})=\sqrt{MSE\times C_{33}}=\sqrt{1.395\times0.067}=0.3057205\]
\[Cor(\mathbf{\hat\beta_2,\hat\beta_4})=\frac{Cov(\mathbf{\hat\beta_2,\hat\beta_4})}{se(\mathbf{\hat\beta_2})se(\mathbf{\hat\beta_4})}=\frac{-0.048825}{0.3057205\times0.3102499}=-0.5147615\]
\(C_{66}=0.058\) is the smallest diagonal element, so \(\hat\beta_5\) has the least variance and is the most consistent among the estimators.
According to the \((X'X)^{-1}\),
\(C_{13},\ C_{17},\ C_{24},\ C_{25},\ C_{67}\) are positive.
The positively correlated pairs of parameters are
\(\hat\beta_0\) and \(\hat\beta_2\), \(\hat\beta_0\) and \(\hat\beta_6\), \(\hat\beta_1\) and \(\hat\beta_3\), \(\hat\beta_1\) and \(\hat\beta_4\), and \(\hat\beta_5\) and \(\hat\beta_6\).
Consider the following hypothesis: \(H_0:\ \beta_1=2\beta_3,\ \beta_2=\beta_3,\ \beta_5=0\)
Report the T matrix, β vector, and c vector along with their dimensions, and the rank of the T matrix for testing the above hypothesis.
\[ \mathbf{T}=\begin{bmatrix} 0 & 1 & 0 & -2 & 0 & 0& 0 \\ 0 & 0 & 1 & -1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}_{3\times7} \mathbf{β}=\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \\ \beta_6 \end{bmatrix}_{7\times1} \mathbf{C}=\begin{bmatrix} 0 \\ 0 \\ 0\end{bmatrix}_{3\times1} rank(T)=3 \]
Under this hypothesis, \(y=\beta_0+2\beta_3x_1+\beta_3x_2+\beta_3x_3+\beta_4x_4+0\cdot x_5+\beta_6x_6+\varepsilon=\beta_0+\beta_3(2x_1+x_2+x_3)+\beta_4x_4+\beta_6x_6+\varepsilon\)
The numerator degrees of freedom is \(r=df_{Reduced}-df_{Full}=[n-(3+1)]-[n-(6+1)]=3\)
The denominator degrees of freedom is \(df_{Full}=n-(k+1)=38-(6+1)=31\)
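Combining these degrees of freedom, the general-linear-hypothesis F statistic takes the usual reduced-versus-full form:

```latex
F=\frac{(SSE_{Reduced}-SSE_{Full})/r}{SSE_{Full}/df_{Full}}
 =\frac{(SSE_{Reduced}-SSE_{Full})/3}{MSE_{Full}}
 \sim F_{3,\,31}\ \text{under } H_0
```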
\[SSR=\sum_{i=1}^n(\hat y_i-\bar y)^2=\sum_{i=1}^n(\hat y_i^2-2\hat y_i\bar y+\bar y^2)=\sum_{i=1}^n\hat y_i^2-2\bar y\sum_{i=1}^n\hat y_i+\sum_{i=1}^n\bar y^2\]
\[=\sum_{i=1}^n\hat y_i^2-2\bar yn\frac{\sum_{i=1}^n\hat y_i}n+n\bar y^2=\sum_{i=1}^n\hat y_i^2-2\bar yn\bar y+n\bar y^2=\sum_{i=1}^n\hat y_i^2-n\bar y^2\]
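The identity can be confirmed numerically on any OLS fit with an intercept (it relies on \(\sum\hat y_i=\sum y_i\), which holds whenever the model contains an intercept); a quick sketch using R's built-in mtcars data:

```r
# check SSR = sum(yhat^2) - n*ybar^2 on a toy regression
fit  <- lm(mpg ~ wt + hp, data = mtcars)
yhat <- fitted(fit)
ybar <- mean(mtcars$mpg)
n    <- nrow(mtcars)

ssr_def      <- sum((yhat - ybar)^2)      # definition of SSR
ssr_identity <- sum(yhat^2) - n * ybar^2  # simplified form derived above

all.equal(ssr_def, ssr_identity)  # TRUE
```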
The data in the WaterFlow file are simulated data on peak rate of flow (in cfs) of water from six watersheds following storm episodes. The predictors are:
x1 : Area of watershed (mi²)
x2 : Area impervious to water (mi²)
x3 : Average slope of watershed (percent)
x4 : Longest stream flow in watershed (1000s of feet)
x5 : Surface absorbency index (0 = complete absorbency, 100 = no absorbency)
x6 : Estimated soil storage capacity (inches of water)
x7 : Infiltration rate of water into soil (inches/hour)
x8 : Rainfall (inches)
x9 : Time period during which rainfall exceeded ¼ inch/hour
X2, X7, X1, and X4 have a medium to strong positive linear relationship with the response variable (correlations above 0.6). X5 has a medium negative linear relationship with the response variable.
The scatterplots of y versus x1, x3, and x4 show a medium to strong positive linear relationship, and this is confirmed by the correlation coefficients (above 0.6). Predictor x? has the strongest negative linear relationship with y.
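This screening step is just the response's row of the correlation matrix; a sketch of the idea, using R's built-in mtcars data as a stand-in since the WaterFlow data are not reproduced here (on the assignment data this would be `cor(table_wf)[, "y"]`):

```r
# correlations of every variable with the response, sorted from
# strongest positive to strongest negative
round(sort(cor(mtcars)[, "mpg"], decreasing = TRUE), 3)
```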
Fit the full model.
Explain whether the overall model is significant at 5% significance level.
The fitted model is statistically significant at the 5% significance level (p-value = 0.0000).
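The overall-model p-value comes from the F distribution with k and n−k−1 degrees of freedom; from the ANOVA output further down (F = 10.218 on 9 and 20 df) it can be reproduced directly:

```r
# reproduce the reported overall-F p-value (F = 10.218 on 9 and 20 df)
pf(10.218, df1 = 9, df2 = 20, lower.tail = FALSE)
# about 9.7e-06, which prints as 0.0000 in the olsrr summary
```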
Include plots to examine residuals and validate the OLS assumptions.
There is no violation of the assumptions about the errors (no pattern in the residual plots, and the points follow an approximately straight line on the QQ plot).
Different variable selection procedures include all possible regressions, best subset regression, stepwise regression, stepwise forward regression, and stepwise backward regression.
Tests for heteroskedasticity include the Bartlett test, Breusch-Pagan test, score test, and F test.
Use different plots to detect and identify influential observations
The partial regression plots do not show nonlinear patterns and hence first-order terms are good enough.
VIF, tolerance, and condition indices to detect collinearity, and plots for assessing model fit and the contributions of variables.
The model does have serious multicollinearity problems (VIFs > 10).
X4, X1, X3, X7, X5
Predictor X1 is the number of rooms while X4 is the number of bedrooms in a house. A high correlation is expected between these two variables.
There is a problem of multicollinearity.
It is important to address the multicollinearity first, and hence predictor x6 is the first candidate for removal. However, judging by the variable names, predictor x6 may contain the same information as x7, since bedrooms are rooms in a house.
Further, according to the correlation coefficients, x7 is less correlated with y than x6 is.
It is better to remove x7 first and check whether the multicollinearity is resolved. If it is not, then x6 has to be removed instead.
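Whether dropping a predictor resolves the collinearity can be checked by recomputing the VIFs. A minimal sketch of the VIF calculation itself, on simulated data (the names x6/x7/x3 here are only illustrative of the rooms/bedrooms overlap, not the assignment's data):

```r
# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on all the remaining predictors
vif_by_hand <- function(data, j) {
  r2 <- summary(lm(reformulate(setdiff(names(data), j), response = j),
                   data = data))$r.squared
  1 / (1 - r2)
}

set.seed(1)
d    <- data.frame(x6 = rnorm(50))
d$x7 <- d$x6 + rnorm(50, sd = 0.1)  # x7 nearly duplicates x6
d$x3 <- rnorm(50)                   # an unrelated predictor

sapply(names(d), vif_by_hand, data = d)          # x6 and x7 large; x3 near 1
sapply(names(d)[-2], vif_by_hand, data = d[-2])  # dropping x7 fixes it
```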
The intercept of 10.04 suggests the average sale price (in $1000s) of a house with zero taxes and zero baths. This does not make practical sense, because there cannot be a house without a bathroom.
The coefficient of 5.595 suggests the average sale price of houses increases by 5.595 (in $1000s) when the tax increases by 1000, holding the number of baths constant.
The coefficient of 1.935 suggests the average sale price of houses increases by 1.935 (in $1000s) when the number of baths increases by one, holding the tax constant.
The fitted model is statistically significant at the 5% significance level (p-value = ).
Predictor X1 is the number of rooms while X4 is the number of bedrooms in a house. A high correlation is expected between these two variables.
There is a problem of multicollinearity.
Include plots to examine residuals and validate the OLS assumptions.
There is no violation of the assumptions about the errors (no pattern in the residual plots, and the points follow an approximately straight line on the QQ plot).
Different variable selection procedures include all possible regressions, best subset regression, stepwise regression, stepwise forward regression, and stepwise backward regression.
Tests for heteroskedasticity include the Bartlett test, Breusch-Pagan test, score test, and F test.
Use different plots to detect and identify influential observations.
VIF, tolerance, and condition indices to detect collinearity, and plots for assessing model fit and the contributions of variables.
The general approaches for dealing with multicollinearity include collecting additional data, model respecification (redefining the regressors, variable elimination), and alternative estimation methods (ridge regression, principal-component regression).
“Variable elimination is often a highly effective technique. However, it may not provide a satisfactory solution if the regressors dropped from the model have significant explanatory power relative to the response y. That is, eliminating regressors to reduce multicollinearity may damage the predictive power of the model.” (p.304)
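Of the estimation-based remedies, ridge regression has a simple closed form, \(\hat\beta_{ridge}=(X'X+\lambda I)^{-1}X'y\); a base-R sketch on standardized predictors (mtcars is used purely as illustration, not the assignment's data):

```r
# ridge estimate on centered/scaled data; lambda = 0 recovers OLS
ridge_coef <- function(X, y, lambda) {
  Xs <- scale(X)    # standardize the predictors, as is conventional
  ys <- y - mean(y) # center the response
  solve(crossprod(Xs) + lambda * diag(ncol(Xs)), crossprod(Xs, ys))
}

X <- as.matrix(mtcars[, c("wt", "hp", "disp")])
y <- mtcars$mpg
ridge_coef(X, y, 0)    # OLS coefficients (standardized scale)
ridge_coef(X, y, 10)   # shrunk toward zero by the penalty
```

The penalty trades a little bias for a large reduction in variance, which is exactly what is wanted when \(X'X\) is nearly singular.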
Stepwise Forward Regression based on p values (use α=0.15)
Stepwise AIC Forward Regression
Full model
eliminated model
Stepwise Backward Regression based on p values (use α=0.05)
Stepwise AIC Backward Regression
Full model
eliminated model
Full model
eliminated model
Neither model has a multicollinearity problem (VIFs < 10) or a violation of the assumptions about the errors (no pattern in the residual plots, and the points follow an approximately straight line on the QQ plot).
The model with 4 predictors has a slightly higher (by about 2%) adjusted R-squared compared to the model with only x1 and x2. Further, the x5 and x7 predictors are not statistically significant at the 10% significance level (p-values of 0.11479 and 0.10356, respectively). There is no significant pattern in the plot of studentized residuals versus predicted values from the model with only x1 and x2. The partial regression plots do not show nonlinear patterns, and hence first-order terms are good enough.
Finally, the model with 2 predictors is simpler than the model with 4 predictors. Therefore, the best model will be
Include plots to examine residuals and validate the OLS assumptions.
There is no violation of the assumptions about the errors (no pattern in the residual plots, and the points follow an approximately straight line on the QQ plot).
Residual QQ Plot, Residual Normality Test, Residual vs Fitted Values Plot, Residual Histogram
Different variable selection procedures include all possible regressions, best subset regression, stepwise regression, stepwise forward regression, and stepwise backward regression.
Tests for heteroskedasticity include the Bartlett test, Breusch-Pagan test, score test, and F test.
Bartlett Test, Breusch-Pagan Test, Score Test, F Test
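As a sketch of what these tests do, the Breusch-Pagan statistic (in Koenker's studentized form) is just n·R² from regressing the squared residuals on the predictors; illustrated here on a toy model rather than the assignment's data:

```r
# Breusch-Pagan test by hand: under H0 (constant error variance),
# n * R^2 of the auxiliary regression is chi-squared distributed
# with df = number of predictors
fit <- lm(mpg ~ wt + hp, data = mtcars)                # illustrative model
aux <- lm(resid(fit)^2 ~ wt + hp, data = mtcars)       # auxiliary regression
bp  <- nrow(mtcars) * summary(aux)$r.squared
c(statistic = bp, p.value = pchisq(bp, df = 2, lower.tail = FALSE))
```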
Use different plots to detect and identify influential observations.
Cook's D Bar Plot, Cook's D Chart, DFBETAs Panel, DFFITs Plot, Studentized Residual Plot, Standardized Residual Chart, Studentized Residuals vs Leverage Plot, Deleted Studentized Residual vs Fitted Values Plot, Hadi Plot, Potential-Residual Plot
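Most of the quantities behind these plots are available directly from base R for any fitted lm; a brief sketch (toy model, not the housing data):

```r
# influence diagnostics underlying the plots listed above
fit <- lm(mpg ~ wt + hp, data = mtcars)

cd  <- cooks.distance(fit)  # Cook's D
dfb <- dfbetas(fit)         # DFBETAs, one column per coefficient
dff <- dffits(fit)          # DFFITs
rst <- rstudent(fit)        # (externally) studentized residuals

# a common screen: flag observations with Cook's D above 4/n
names(which(cd > 4 / nrow(mtcars)))
```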
The model explains about 72.93% of the variation in the sale price of houses in Erie, Pennsylvania.
# build the full model (ols_regress comes from the olsrr package)
library(olsrr)
model_wf_full <- lm(y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9, data = table_wf)
ols_regress(model_wf_full)
## Model Summary
## ------------------------------------------------------------------
## R 0.906 RMSE 609.308
## R-Squared 0.821 Coef. Var 47.188
## Adj. R-Squared 0.741 MSE 371256.369
## Pred R-Squared 0.618 MAE 366.548
## ------------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## ------------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## ------------------------------------------------------------------------
## Regression 34143007.990 9 3793667.554 10.218 0.0000
## Residual 7425127.376 20 371256.369
## Total 41568135.367 29
## ------------------------------------------------------------------------
##
## Parameter Estimates
## -------------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## -------------------------------------------------------------------------------------------------
## (Intercept) 292.561 4428.618 0.066 0.948 -8945.373 9530.495
## X1 -203.144 410.268 -0.472 -0.495 0.626 -1058.947 652.660
## X2 1055.782 9833.700 0.028 0.107 0.916 -19456.957 21568.521
## X3 -49.240 156.200 -0.167 -0.315 0.756 -375.067 276.588
## X4 209.762 162.046 1.258 1.294 0.210 -128.259 547.783
## X5 -10.197 51.088 -0.059 -0.200 0.844 -116.764 96.370
## X6 -24.558 303.529 -0.012 -0.081 0.936 -657.709 608.592
## X7 142.778 3288.443 0.019 0.043 0.966 -6716.793 7002.349
## X8 511.713 209.741 0.541 2.440 0.024 74.200 949.226
## X9 -301.872 171.996 -0.398 -1.755 0.095 -660.649 56.905
## -------------------------------------------------------------------------------------------------
model_wf_full %>% summary()
##
## Call:
## lm(formula = y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
## data = table_wf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1404.21 -318.77 74.73 266.66 1274.30
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 292.56 4428.62 0.066 0.9480
## X1 -203.14 410.27 -0.495 0.6259
## X2 1055.78 9833.70 0.107 0.9156
## X3 -49.24 156.20 -0.315 0.7558
## X4 209.76 162.05 1.294 0.2103
## X5 -10.20 51.09 -0.200 0.8438
## X6 -24.56 303.53 -0.081 0.9363
## X7 142.78 3288.44 0.043 0.9658
## X8 511.71 209.74 2.440 0.0241 *
## X9 -301.87 172.00 -1.755 0.0945 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 609.3 on 20 degrees of freedom
## Multiple R-squared: 0.8214, Adjusted R-squared: 0.741
## F-statistic: 10.22 on 9 and 20 DF, p-value: 9.744e-06
Anova(model_wf_full)  # Type II tests, from the car package
## Anova Table (Type II tests)
##
## Response: y
## Sum Sq Df F value Pr(>F)
## X1 91022 1 0.2452 0.62589
## X2 4279 1 0.0115 0.91557
## X3 36893 1 0.0994 0.75585
## X4 622091 1 1.6756 0.21025
## X5 14790 1 0.0398 0.84381
## X6 2430 1 0.0065 0.93632
## X7 700 1 0.0019 0.96580
## X8 2209825 1 5.9523 0.02414 *
## X9 1143622 1 3.0804 0.09455 .
## Residuals 7425127 20
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Model Fit Assessment
ols_plot_diagnostics(model_wf_full)
# Part & Partial Correlations
ols_test_correlation(model_wf_full) # Correlation between observed residuals and expected residuals under normality.
## [1] 0.9710713
# Residual Normality Tests
ols_test_normality(model_wf_full) # tests for violation of the normality assumption; a large p-value indicates no evidence of non-normality
## -----------------------------------------------
## Test Statistic pvalue
## -----------------------------------------------
## Shapiro-Wilk 0.9589 0.2898
## Kolmogorov-Smirnov 0.1423 0.5314
## Cramer-von Mises 2.5333 0.0000
## Anderson-Darling 0.5169 0.1748
## -----------------------------------------------
#Lack of Fit F Test
ols_pure_error_anova(lm(y~X8, data = table_wf))
## Lack of Fit F Test
## ---------------
## Response : y
## Predictor: X8
##
## Analysis of Variance Table
## -------------------------------------------------------------------------
## DF Sum Sq Mean Sq F Value Pr(>F)
## -------------------------------------------------------------------------
## X8 1 4616882.92 4616882.92 5.795558 0.02290414
## Residual 28 36951252.44 1319687.59
## Lack of fit 21 31374881.28 1494041.97 1.875466 0.2003839
## Pure Error 7 5576371.17 796624.45
## -------------------------------------------------------------------------
# Variable Contributions
ols_plot_added_variable(model_wf_full)
# Residual Plus Component Plot
ols_plot_comp_plus_resid(model_wf_full)
alias(lm(y ~ as.factor(X3) + as.factor(X4) + as.factor(X5) + as.factor(X6) + as.factor(X7), data=table_wf))
## Model :
## y ~ as.factor(X3) + as.factor(X4) + as.factor(X5) + as.factor(X6) +
## as.factor(X7)
##
## Complete :
## (Intercept) as.factor(X3)6 as.factor(X3)6.5 as.factor(X3)7 as.factor(X3)15 as.factor(X4)2 as.factor(X5)60 as.factor(X5)65 as.factor(X5)70 as.factor(X6)1
## as.factor(X4)10 0 0 0 0 1 0 0 0 0 0
## as.factor(X4)15 0 1 0 1 0 0 0 0 0 0
## as.factor(X4)19 0 0 1 0 0 -1 0 0 0 0
## as.factor(X5)62 0 1 0 0 0 0 0 0 0 0
## as.factor(X5)67 0 0 0 1 0 0 0 0 0 0
## as.factor(X5)68 0 0 0 0 1 1 -1 -1 0 0
## as.factor(X5)80 1 -1 -1 -1 -1 0 0 0 -1 0
## as.factor(X6)1.5 0 1 0 0 0 0 0 0 1 0
## as.factor(X6)2 1 -1 0 -1 -1 -1 1 1 -1 -1
## as.factor(X7)0.2 0 0 0 0 1 0 0 0 0 0
## as.factor(X7)0.25 1 -1 -1 -1 -1 0 0 0 0 0
## as.factor(X7)0.35 0 0 0 0 -1 0 1 1 0 0
## as.factor(X7)0.5 0 0 1 1 0 -1 0 0 0 0
## as.factor(X7)0.6 0 1 0 0 0 0 0 0 0 0
Stepwise Forward Regression for full model
# Stepwise Forward Regression based on p values (use α=0.15) #
ols_step_forward_p(model_wf_full_log, penter = 0.15)
## Forward Selection Method
## ---------------------------
##
## Candidate Terms:
##
## 1. X1
## 2. X2
## 3. X3
## 4. X4
## 5. X5
## 6. X6
## 7. X7
## 8. X8
## 9. X9
##
## We are selecting variables based on p value...
##
## Variables Entered:
##
## - X4
## - X3
## - X7
##
## No more variables to be added.
##
## Final Model Output
## ------------------
##
## Model Summary
## -------------------------------------------------------------
## R 0.944 RMSE 0.549
## R-Squared 0.890 Coef. Var 8.618
## Adj. R-Squared 0.878 MSE 0.301
## Pred R-Squared 0.854 MAE 0.414
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -------------------------------------------------------------------
## Regression 63.565 3 21.188 70.378 0.0000
## Residual 7.828 26 0.301
## Total 71.393 29
## -------------------------------------------------------------------
##
## Parameter Estimates
## -------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## -------------------------------------------------------------------------------------
## (Intercept) 2.872 0.547 5.254 0.000 1.748 3.995
## X4 0.122 0.033 0.559 3.730 0.001 0.055 0.189
## X3 0.168 0.040 0.435 4.165 0.000 0.085 0.251
## X7 3.106 1.537 0.309 2.021 0.054 -0.053 6.266
## -------------------------------------------------------------------------------------
##
## Selection Summary
## ------------------------------------------------------------------------
## Variable Adj.
## Step Entered R-Square R-Square C(p) AIC RMSE
## ------------------------------------------------------------------------
## 1 X4 0.8030 0.7960 48.8552 68.4060 0.7087
## 2 X3 0.8731 0.8637 24.2129 57.2082 0.5792
## 3 X7 0.8904 0.8777 19.6668 54.8305 0.5487
## ------------------------------------------------------------------------
# Stepwise AIC Forward Regression #
ols_step_forward_aic(model_wf_full_log)
## Forward Selection Method
## ------------------------
##
## Candidate Terms:
##
## 1 . X1
## 2 . X2
## 3 . X3
## 4 . X4
## 5 . X5
## 6 . X6
## 7 . X7
## 8 . X8
## 9 . X9
##
##
## Variables Entered:
##
## - X4
## - X3
## - X7
## - X8
## - X9
## - X6
##
## No more variables to be added.
##
## Selection Summary
## ---------------------------------------------------------------
## Variable AIC Sum Sq RSS R-Sq Adj. R-Sq
## ---------------------------------------------------------------
## X4 68.406 57.330 14.063 0.80302 0.79599
## X3 57.208 62.335 9.057 0.87313 0.86373
## X7 54.830 63.565 7.828 0.89036 0.87771
## X8 54.522 64.144 7.248 0.89848 0.88223
## X9 44.504 66.537 4.856 0.93199 0.91782
## X6 39.161 67.591 3.801 0.94675 0.93286
## ---------------------------------------------------------------
Stepwise Forward Regression for X4 eliminated model
# Stepwise Forward Regression based on p values (use α=0.15) #
ols_step_forward_p(model_wf_rm4_log, penter = 0.15)
## Forward Selection Method
## ---------------------------
##
## Candidate Terms:
##
## 1. X1
## 2. X2
## 3. X3
## 4. X5
## 5. X6
## 6. X7
## 7. X8
## 8. X9
##
## We are selecting variables based on p value...
##
## Variables Entered:
##
## - X1
## - X3
## - X7
## - X6
## - X8
## - X9
##
## No more variables to be added.
##
## Final Model Output
## ------------------
##
## Model Summary
## -------------------------------------------------------------
## R 0.971 RMSE 0.421
## R-Squared 0.943 Coef. Var 6.618
## Adj. R-Squared 0.928 MSE 0.178
## Pred R-Squared 0.900 MAE 0.292
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -------------------------------------------------------------------
## Regression 67.310 6 11.218 63.195 0.0000
## Residual 4.083 23 0.178
## Total 71.393 29
## -------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) 2.307 0.410 5.623 0.000 1.458 3.156
## X1 0.207 0.053 0.368 3.897 0.001 0.097 0.317
## X3 0.263 0.022 0.680 11.944 0.000 0.217 0.308
## X7 5.453 1.002 0.542 5.442 0.000 3.380 7.525
## X6 -0.532 0.144 -0.192 -3.688 0.001 -0.831 -0.234
## X8 0.613 0.137 0.495 4.462 0.000 0.329 0.897
## X9 -0.433 0.112 -0.435 -3.864 0.001 -0.665 -0.201
## ----------------------------------------------------------------------------------------
##
## Selection Summary
## -------------------------------------------------------------------------
## Variable Adj.
## Step Entered R-Square R-Square C(p) AIC RMSE
## -------------------------------------------------------------------------
## 1 X1 0.5266 0.5097 154.8516 94.7131 1.0987
## 2 X3 0.8121 0.7981 47.7988 68.9988 0.7050
## 3 X7 0.8718 0.8570 26.9889 59.5306 0.5934
## 4 X6 0.8932 0.8761 20.8073 56.0486 0.5523
## 5 X8 0.9057 0.8860 18.0270 54.3108 0.5297
## 6 X9 0.9428 0.9279 5.8470 41.3046 0.4213
## -------------------------------------------------------------------------
# Stepwise AIC Forward Regression #
ols_step_forward_aic(model_wf_rm4_log)
## Forward Selection Method
## ------------------------
##
## Candidate Terms:
##
## 1 . X1
## 2 . X2
## 3 . X3
## 4 . X5
## 5 . X6
## 6 . X7
## 7 . X8
## 8 . X9
##
##
## Variables Entered:
##
## - X1
## - X3
## - X7
## - X6
## - X8
## - X9
##
## No more variables to be added.
##
## Selection Summary
## ---------------------------------------------------------------
## Variable AIC Sum Sq RSS R-Sq Adj. R-Sq
## ---------------------------------------------------------------
## X1 94.713 37.594 33.799 0.52658 0.50967
## X3 68.999 57.974 13.418 0.81205 0.79813
## X7 59.531 62.237 9.155 0.87176 0.85696
## X6 56.049 63.766 7.626 0.89318 0.87609
## X8 54.311 64.660 6.733 0.90569 0.88604
## X9 41.305 67.310 4.083 0.94281 0.92789
## ---------------------------------------------------------------
Stepwise Forward Regression for X1 eliminated model
# Stepwise Forward Regression based on p values (use α=0.15) #
ols_step_forward_p(model_wf_rm1_log, penter = 0.15)
## Forward Selection Method
## ---------------------------
##
## Candidate Terms:
##
## 1. X2
## 2. X3
## 3. X4
## 4. X5
## 5. X6
## 6. X7
## 7. X8
## 8. X9
##
## We are selecting variables based on p value...
##
## Variables Entered:
##
## - X4
## - X3
## - X7
##
## No more variables to be added.
##
## Final Model Output
## ------------------
##
## Model Summary
## -------------------------------------------------------------
## R 0.944 RMSE 0.549
## R-Squared 0.890 Coef. Var 8.618
## Adj. R-Squared 0.878 MSE 0.301
## Pred R-Squared 0.854 MAE 0.414
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -------------------------------------------------------------------
## Regression 63.565 3 21.188 70.378 0.0000
## Residual 7.828 26 0.301
## Total 71.393 29
## -------------------------------------------------------------------
##
## Parameter Estimates
## -------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## -------------------------------------------------------------------------------------
## (Intercept) 2.872 0.547 5.254 0.000 1.748 3.995
## X4 0.122 0.033 0.559 3.730 0.001 0.055 0.189
## X3 0.168 0.040 0.435 4.165 0.000 0.085 0.251
## X7 3.106 1.537 0.309 2.021 0.054 -0.053 6.266
## -------------------------------------------------------------------------------------
##
## Selection Summary
## ------------------------------------------------------------------------
## Variable Adj.
## Step Entered R-Square R-Square C(p) AIC RMSE
## ------------------------------------------------------------------------
## 1 X4 0.8030 0.7960 52.5895 68.4060 0.7087
## 2 X3 0.8731 0.8637 26.6181 57.2082 0.5792
## 3 X7 0.8904 0.8777 21.7454 54.8305 0.5487
## ------------------------------------------------------------------------
# Stepwise AIC Forward Regression #
ols_step_forward_aic(model_wf_rm1_log)
## Forward Selection Method
## ------------------------
##
## Candidate Terms:
##
## 1 . X2
## 2 . X3
## 3 . X4
## 4 . X5
## 5 . X6
## 6 . X7
## 7 . X8
## 8 . X9
##
##
## Variables Entered:
##
## - X4
## - X3
## - X7
## - X8
## - X9
## - X6
##
## No more variables to be added.
##
## Selection Summary
## ---------------------------------------------------------------
## Variable AIC Sum Sq RSS R-Sq Adj. R-Sq
## ---------------------------------------------------------------
## X4 68.406 57.330 14.063 0.80302 0.79599
## X3 57.208 62.335 9.057 0.87313 0.86373
## X7 54.830 63.565 7.828 0.89036 0.87771
## X8 54.522 64.144 7.248 0.89848 0.88223
## X9 44.504 66.537 4.856 0.93199 0.91782
## X6 39.161 67.591 3.801 0.94675 0.93286
## ---------------------------------------------------------------
Stepwise Backward Regression for full model
# Stepwise Backward Regression based on p values (intended α=0.05) #
# note: the removal threshold argument of ols_step_backward_p is `prem`;
# `penter` is ignored, so the default p value of 0.3 was used (see output)
ols_step_backward_p(model_wf_full_log, penter = 0.05)
## Backward Elimination Method
## ---------------------------
##
## Candidate Terms:
##
## 1 . X1
## 2 . X2
## 3 . X3
## 4 . X4
## 5 . X5
## 6 . X6
## 7 . X7
## 8 . X8
## 9 . X9
##
## We are eliminating variables based on p value...
##
## Variables Removed:
##
## - X1
## - X2
## - X5
##
## No more variables satisfy the condition of p value = 0.3
##
##
## Final Model Output
## ------------------
##
## Model Summary
## -------------------------------------------------------------
## R 0.973 RMSE 0.407
## R-Squared 0.947 Coef. Var 6.385
## Adj. R-Squared 0.933 MSE 0.165
## Pred R-Squared 0.908 MAE 0.273
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -------------------------------------------------------------------
## Regression 67.591 6 11.265 68.16 0.0000
## Residual 3.801 23 0.165
## Total 71.393 29
## -------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) 2.692 0.445 6.046 0.000 1.771 3.613
## X3 0.184 0.032 0.476 5.698 0.000 0.117 0.251
## X4 0.109 0.026 0.499 4.244 0.000 0.056 0.162
## X6 -0.368 0.146 -0.133 -2.526 0.019 -0.669 -0.066
## X7 4.085 1.213 0.406 3.367 0.003 1.575 6.595
## X8 0.612 0.133 0.493 4.614 0.000 0.337 0.886
## X9 -0.448 0.108 -0.450 -4.135 0.000 -0.672 -0.224
## ----------------------------------------------------------------------------------------
##
##
## Elimination Summary
## -----------------------------------------------------------------------
## Variable Adj.
## Step Removed R-Square R-Square C(p) AIC RMSE
## -----------------------------------------------------------------------
## 1 X1 0.9474 0.9273 8.0021 42.8146 0.4230
## 2 X2 0.9472 0.9304 6.0604 40.9019 0.4139
## 3 X5 0.9468 0.9329 4.2345 39.1611 0.4065
## -----------------------------------------------------------------------
# Stepwise AIC Backward Regression #
ols_step_backward_aic(model_wf_full_log)
## Backward Elimination Method
## ---------------------------
##
## Candidate Terms:
##
## 1 . X1
## 2 . X2
## 3 . X3
## 4 . X4
## 5 . X5
## 6 . X6
## 7 . X7
## 8 . X8
## 9 . X9
##
##
## Variables Removed:
##
## - X1
## - X2
## - X5
##
## No more variables to be removed.
##
##
## Backward Elimination Summary
## ---------------------------------------------------------------
## Variable AIC RSS Sum Sq R-Sq Adj. R-Sq
## ---------------------------------------------------------------
## Full Model 44.811 3.757 67.635 0.94737 0.92369
## X1 42.815 3.758 67.635 0.94737 0.92731
## X2 40.902 3.769 67.624 0.94721 0.93042
## X5 39.161 3.801 67.591 0.94675 0.93286
## ---------------------------------------------------------------
Stepwise Backward Regression for X4 eliminated model
# Stepwise Backward Regression based on p values (intended α=0.05) #
# note: the removal threshold argument of ols_step_backward_p is `prem`;
# `penter` is ignored, so the default p value of 0.3 was used (see output)
ols_step_backward_p(model_wf_rm4_log, penter = 0.05)
## Backward Elimination Method
## ---------------------------
##
## Candidate Terms:
##
## 1 . X1
## 2 . X2
## 3 . X3
## 4 . X5
## 5 . X6
## 6 . X7
## 7 . X8
## 8 . X9
##
## We are eliminating variables based on p value...
##
## Variables Removed:
##
## - X5
## - X2
##
## No more variables satisfy the condition of p value = 0.3
##
##
## Final Model Output
## ------------------
##
## Model Summary
## -------------------------------------------------------------
## R 0.971 RMSE 0.421
## R-Squared 0.943 Coef. Var 6.618
## Adj. R-Squared 0.928 MSE 0.178
## Pred R-Squared 0.900 MAE 0.292
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -------------------------------------------------------------------
## Regression 67.310 6 11.218 63.195 0.0000
## Residual 4.083 23 0.178
## Total 71.393 29
## -------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) 2.307 0.410 5.623 0.000 1.458 3.156
## X1 0.207 0.053 0.368 3.897 0.001 0.097 0.317
## X3 0.263 0.022 0.680 11.944 0.000 0.217 0.308
## X6 -0.532 0.144 -0.192 -3.688 0.001 -0.831 -0.234
## X7 5.453 1.002 0.542 5.442 0.000 3.380 7.525
## X8 0.613 0.137 0.495 4.462 0.000 0.329 0.897
## X9 -0.433 0.112 -0.435 -3.864 0.001 -0.665 -0.201
## ----------------------------------------------------------------------------------------
##
##
## Elimination Summary
## -----------------------------------------------------------------------
## Variable Adj.
## Step Removed R-Square R-Square C(p) AIC RMSE
## -----------------------------------------------------------------------
## 1 X5 0.9444 0.9267 7.2445 42.4657 0.4248
## 2 X2 0.9428 0.9279 5.8470 41.3046 0.4213
## -----------------------------------------------------------------------
# Stepwise AIC Backward Regression #
ols_step_backward_aic(model_wf_rm4_log)
## Backward Elimination Method
## ---------------------------
##
## Candidate Terms:
##
## 1 . X1
## 2 . X2
## 3 . X3
## 4 . X5
## 5 . X6
## 6 . X7
## 7 . X8
## 8 . X9
##
##
## Variables Removed:
##
## - X5
## - X2
##
## No more variables to be removed.
##
##
## Backward Elimination Summary
## ---------------------------------------------------------------
## Variable AIC RSS Sum Sq R-Sq Adj. R-Sq
## ---------------------------------------------------------------
## Full Model 44.118 3.925 67.468 0.94503 0.92409
## X5 42.466 3.970 67.422 0.94439 0.92669
## X2 41.305 4.083 67.310 0.94281 0.92789
## ---------------------------------------------------------------
Stepwise Backward Regression for X1 eliminated model
# Stepwise Backward Regression based on p values (intended α=0.05) #
# note: the removal threshold argument of ols_step_backward_p is `prem`;
# `penter` is ignored, so the default p value of 0.3 was used (see output)
ols_step_backward_p(model_wf_rm1_log, penter = 0.05)
## Backward Elimination Method
## ---------------------------
##
## Candidate Terms:
##
## 1 . X2
## 2 . X3
## 3 . X4
## 4 . X5
## 5 . X6
## 6 . X7
## 7 . X8
## 8 . X9
##
## We are eliminating variables based on p value...
##
## Variables Removed:
##
## - X2
## - X5
##
## No more variables satisfy the condition of p value = 0.3
##
##
## Final Model Output
## ------------------
##
## Model Summary
## -------------------------------------------------------------
## R 0.973 RMSE 0.407
## R-Squared 0.947 Coef. Var 6.385
## Adj. R-Squared 0.933 MSE 0.165
## Pred R-Squared 0.908 MAE 0.273
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -------------------------------------------------------------------
## Regression 67.591 6 11.265 68.16 0.0000
## Residual 3.801 23 0.165
## Total 71.393 29
## -------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) 2.692 0.445 6.046 0.000 1.771 3.613
## X3 0.184 0.032 0.476 5.698 0.000 0.117 0.251
## X4 0.109 0.026 0.499 4.244 0.000 0.056 0.162
## X6 -0.368 0.146 -0.133 -2.526 0.019 -0.669 -0.066
## X7 4.085 1.213 0.406 3.367 0.003 1.575 6.595
## X8 0.612 0.133 0.493 4.614 0.000 0.337 0.886
## X9 -0.448 0.108 -0.450 -4.135 0.000 -0.672 -0.224
## ----------------------------------------------------------------------------------------
##
##
## Elimination Summary
## -----------------------------------------------------------------------
## Variable Adj.
## Step Removed R-Square R-Square C(p) AIC RMSE
## -----------------------------------------------------------------------
## 1 X2 0.9472 0.9304 7.0612 40.9019 0.4139
## 2 X5 0.9468 0.9329 5.2440 39.1611 0.4065
## -----------------------------------------------------------------------
# Stepwise AIC Backward Regression #
ols_step_backward_aic(model_wf_rm1_log)
## Backward Elimination Method
## ---------------------------
##
## Candidate Terms:
##
## 1 . X2
## 2 . X3
## 3 . X4
## 4 . X5
## 5 . X6
## 6 . X7
## 7 . X8
## 8 . X9
##
##
## Variables Removed:
##
## - X2
## - X5
##
## No more variables to be removed.
##
##
## Backward Elimination Summary
## ---------------------------------------------------------------
## Variable AIC RSS Sum Sq R-Sq Adj. R-Sq
## ---------------------------------------------------------------
## Full Model 42.815 3.758 67.635 0.94737 0.92731
## X2 40.902 3.769 67.624 0.94721 0.93042
## X5 39.161 3.801 67.591 0.94675 0.93286
## ---------------------------------------------------------------
## Best Subsets Regression
## -----------------------------------------
## Model Index Predictors
## -----------------------------------------
## 1 X4
## 2 X3 X4
## 3 X3 X4 X7
## 4 X1 X4 X8 X9
## 5 X3 X4 X7 X8 X9
## 6 X3 X4 X6 X7 X8 X9
## 7 X3 X4 X5 X6 X7 X8 X9
## 8 X2 X3 X4 X5 X6 X7 X8 X9
## 9 X1 X2 X3 X4 X5 X6 X7 X8 X9
## -----------------------------------------
##
## Subsets Regression Summary
## ------------------------------------------------------------------------------------------------------------------------------
## Adj. Pred
## Model R-Square R-Square R-Square C(p) AIC SBIC SBC MSEP FPE HSP APC
## ------------------------------------------------------------------------------------------------------------------------------
## 1 0.8030 0.7960 0.7717 48.8552 68.4060 -19.8453 72.6096 0.5382 0.5357 0.0186 0.2251
## 2 0.8731 0.8637 0.8435 24.2129 57.2082 -30.4801 62.8130 0.3733 0.3690 0.0129 0.1551
## 3 0.8904 0.8777 0.8539 19.6668 54.8305 -32.7027 61.8365 0.3484 0.3412 0.0120 0.1434
## 4 0.9209 0.9082 0.8862 10.0581 47.0333 -38.1223 55.4405 0.2723 0.2635 0.0094 0.1107
## 5 0.9320 0.9178 0.8917 7.8458 44.5038 -38.7554 54.3122 0.2545 0.2428 0.0088 0.1020
## 6 0.9468 0.9329 0.9084 4.2345 39.1611 -39.6845 50.3706 0.2174 0.2038 0.0075 0.0857
## 7 0.9472 0.9304 0.9021 6.0604 40.9019 -36.7978 53.5126 0.2360 0.2170 0.0082 0.0912
## 8 0.9474 0.9273 0.8957 8.0021 42.8146 -33.8243 56.8265 0.2589 0.2326 0.0089 0.0977
## 9 0.9474 0.9237 0.886 10.0000 44.8113 -30.8250 60.2245 0.2861 0.2505 0.0099 0.1053
## ------------------------------------------------------------------------------------------------------------------------------
## AIC: Akaike Information Criteria
## SBIC: Sawa's Bayesian Information Criteria
## SBC: Schwarz Bayesian Criteria
## MSEP: Estimated error of prediction, assuming multivariate normality
## FPE: Final Prediction Error
## HSP: Hocking's Sp
## APC: Amemiya Prediction Criteria
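The AIC and SBC columns in the subset summaries above can be reproduced by hand from a fitted model's residual sum of squares. A minimal sketch, shown on the built-in mtcars data since table_wf is not reproduced here (`fit`, `aic_manual`, and `sbc_manual` are illustrative names):

```r
# Reproduce AIC and SBC (BIC) from the RSS of a fitted lm.
fit <- lm(mpg ~ wt + cyl, data = mtcars)
n   <- nobs(fit)
p   <- length(coef(fit))          # intercept + slopes
rss <- sum(resid(fit)^2)
# Gaussian log-likelihood form; the "+ 1" in the penalty counts sigma^2
aic_manual <- n * log(2 * pi) + n * log(rss / n) + n + 2 * (p + 1)
sbc_manual <- n * log(2 * pi) + n * log(rss / n) + n + log(n) * (p + 1)
c(aic_manual, AIC(fit))           # the two values should agree
c(sbc_manual, BIC(fit))
```

This matches the convention `stats::AIC()` uses for `lm` objects, which is also the one olsrr reports.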
## Best Subsets Regression
## --------------------------------------
## Model Index Predictors
## --------------------------------------
## 1 X1
## 2 X3 X7
## 3 X1 X3 X7
## 4 X1 X3 X6 X7
## 5 X1 X3 X7 X8 X9
## 6 X1 X3 X6 X7 X8 X9
## 7 X1 X2 X3 X6 X7 X8 X9
## 8 X1 X2 X3 X5 X6 X7 X8 X9
## --------------------------------------
##
## Subsets Regression Summary
## -------------------------------------------------------------------------------------------------------------------------------
## Adj. Pred
## Model R-Square R-Square R-Square C(p) AIC SBIC SBC MSEP FPE HSP APC
## -------------------------------------------------------------------------------------------------------------------------------
## 1 0.5266 0.5097 0.4658 154.8516 94.7131 4.8488 98.9167 1.2935 1.2876 0.0447 0.5411
## 2 0.8317 0.8192 0.7983 40.2985 65.6888 -23.2171 71.2936 0.4953 0.4896 0.0171 0.2057
## 3 0.8718 0.8570 0.8308 26.9889 59.5306 -29.0072 66.5365 0.4075 0.3991 0.0141 0.1677
## 4 0.8932 0.8761 0.8454 20.8073 56.0486 -31.8764 64.4557 0.3678 0.3559 0.0127 0.1496
## 5 0.9090 0.8900 0.8567 16.7678 53.2435 -33.5760 63.0518 0.3406 0.3249 0.0118 0.1365
## 6 0.9428 0.9279 0.9003 5.8470 41.3046 -38.8856 52.5142 0.2335 0.2189 0.0081 0.0920
## 7 0.9444 0.9267 0.8975 7.2445 42.4657 -36.4162 55.0764 0.2486 0.2286 0.0086 0.0961
## 8 0.9450 0.9241 0.8902 9.0000 44.1184 -33.6709 58.1304 0.2704 0.2430 0.0093 0.1021
## -------------------------------------------------------------------------------------------------------------------------------
## AIC: Akaike Information Criteria
## SBIC: Sawa's Bayesian Information Criteria
## SBC: Schwarz Bayesian Criteria
## MSEP: Estimated error of prediction, assuming multivariate normality
## FPE: Final Prediction Error
## HSP: Hocking's Sp
## APC: Amemiya Prediction Criteria
## Best Subsets Regression
## --------------------------------------
## Model Index Predictors
## --------------------------------------
## 1 X4
## 2 X3 X4
## 3 X3 X4 X7
## 4 X3 X4 X8 X9
## 5 X3 X4 X7 X8 X9
## 6 X3 X4 X6 X7 X8 X9
## 7 X3 X4 X5 X6 X7 X8 X9
## 8 X2 X3 X4 X5 X6 X7 X8 X9
## --------------------------------------
##
## Subsets Regression Summary
## ------------------------------------------------------------------------------------------------------------------------------
## Adj. Pred
## Model R-Square R-Square R-Square C(p) AIC SBIC SBC MSEP FPE HSP APC
## ------------------------------------------------------------------------------------------------------------------------------
## 1 0.8030 0.7960 0.7717 52.5895 68.4060 -19.9679 72.6096 0.5382 0.5357 0.0186 0.2251
## 2 0.8731 0.8637 0.8435 26.6181 57.2082 -30.7039 62.8130 0.3733 0.3690 0.0129 0.1551
## 3 0.8904 0.8777 0.8539 21.7454 54.8305 -33.0170 61.8365 0.3484 0.3412 0.0120 0.1434
## 4 0.9156 0.9020 0.8789 13.6900 48.9949 -37.2607 57.4021 0.2907 0.2813 0.0100 0.1182
## 5 0.9320 0.9178 0.8917 9.1352 44.5038 -39.3879 54.3122 0.2545 0.2428 0.0088 0.1020
## 6 0.9468 0.9329 0.9084 5.2440 39.1611 -40.5447 50.3706 0.2174 0.2038 0.0075 0.0857
## 7 0.9472 0.9304 0.9021 7.0612 40.9019 -37.8040 53.5126 0.2360 0.2170 0.0082 0.0912
## 8 0.9474 0.9273 0.8957 9.0000 42.8146 -34.9748 56.8265 0.2589 0.2326 0.0089 0.0977
## ------------------------------------------------------------------------------------------------------------------------------
## AIC: Akaike Information Criteria
## SBIC: Sawa's Bayesian Information Criteria
## SBC: Schwarz Bayesian Criteria
## MSEP: Estimated error of prediction, assuming multivariate normality
## FPE: Final Prediction Error
## HSP: Hocking's Sp
## APC: Amemiya Prediction Criteria
## # A tibble: 511 x 6
## Index N Predictors `R-Square` `Adj. R-Square` `Mallow's Cp`
## * <int> <int> <chr> <dbl> <dbl> <dbl>
## 1 1 1 X4 0.803 0.796 48.9
## 2 2 1 X1 0.527 0.510 154.
## 3 3 1 X5 0.523 0.506 155.
## 4 4 1 X2 0.433 0.413 189.
## 5 5 1 X7 0.350 0.327 221.
## 6 6 1 X3 0.227 0.199 268.
## 7 7 1 X8 0.0407 0.00648 339.
## 8 8 1 X9 0.0176 -0.0175 347.
## 9 9 1 X6 0.00292 -0.0327 353.
## 10 10 2 X3 X4 0.873 0.864 24.2
## # ... with 501 more rows
## # A tibble: 255 x 6
## Index N Predictors `R-Square` `Adj. R-Square` `Mallow's Cp`
## * <int> <int> <chr> <dbl> <dbl> <dbl>
## 1 1 1 X1 0.527 0.510 155.
## 2 2 1 X5 0.523 0.506 156.
## 3 3 1 X2 0.433 0.413 190.
## 4 4 1 X7 0.350 0.327 222.
## 5 5 1 X3 0.227 0.199 269.
## 6 6 1 X8 0.0407 0.00648 340.
## 7 7 1 X9 0.0176 -0.0175 349.
## 8 8 1 X6 0.00292 -0.0327 355.
## 9 9 2 X3 X7 0.832 0.819 40.3
## 10 10 2 X1 X3 0.812 0.798 47.8
## # ... with 245 more rows
## # A tibble: 255 x 6
## Index N Predictors `R-Square` `Adj. R-Square` `Mallow's Cp`
## * <int> <int> <chr> <dbl> <dbl> <dbl>
## 1 1 1 X4 0.803 0.796 52.6
## 2 2 1 X5 0.523 0.506 164.
## 3 3 1 X2 0.433 0.413 200.
## 4 4 1 X7 0.350 0.327 233.
## 5 5 1 X3 0.227 0.199 283.
## 6 6 1 X8 0.0407 0.00648 357.
## 7 7 1 X9 0.0176 -0.0175 366.
## 8 8 1 X6 0.00292 -0.0327 372.
## 9 9 2 X3 X4 0.873 0.864 26.6
## 10 10 2 X3 X7 0.832 0.819 43.2
## # ... with 245 more rows
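The Mallow's Cp column in the tibbles above follows the usual definition Cp = SSE_p / MSE_full − (n − 2p). A sketch on mtcars (hypothetical `full` and `sub` models, since table_wf is not shown here):

```r
# Mallows' Cp for a candidate subset model, where p counts the subset's
# coefficients including the intercept and MSE_full comes from the full model.
full <- lm(mpg ~ wt + cyl + hp, data = mtcars)
sub  <- lm(mpg ~ wt, data = mtcars)
n    <- nobs(full)
p    <- length(coef(sub))
mse_full <- sum(resid(full)^2) / df.residual(full)
cp <- sum(resid(sub)^2) / mse_full - (n - 2 * p)
cp
```

For the full model itself, Cp reduces to p, which is why the largest subsets in the tables show Cp equal to their coefficient count.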
## Stepwise Selection Method
## ---------------------------
##
## Candidate Terms:
##
## 1. X1
## 2. X2
## 3. X3
## 4. X4
## 5. X5
## 6. X6
## 7. X7
## 8. X8
## 9. X9
##
## We are selecting variables based on p value...
##
## Variables Entered/Removed:
##
## - X4 added
## - X3 added
## - X7 added
##
## No more variables to be added/removed.
##
##
## Final Model Output
## ------------------
##
## Model Summary
## -------------------------------------------------------------
## R 0.944 RMSE 0.549
## R-Squared 0.890 Coef. Var 8.618
## Adj. R-Squared 0.878 MSE 0.301
## Pred R-Squared 0.854 MAE 0.414
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -------------------------------------------------------------------
## Regression 63.565 3 21.188 70.378 0.0000
## Residual 7.828 26 0.301
## Total 71.393 29
## -------------------------------------------------------------------
##
## Parameter Estimates
## -------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## -------------------------------------------------------------------------------------
## (Intercept) 2.872 0.547 5.254 0.000 1.748 3.995
## X4 0.122 0.033 0.559 3.730 0.001 0.055 0.189
## X3 0.168 0.040 0.435 4.165 0.000 0.085 0.251
## X7 3.106 1.537 0.309 2.021 0.054 -0.053 6.266
## -------------------------------------------------------------------------------------
##
## Stepwise Selection Summary
## ------------------------------------------------------------------------------------
## Added/ Adj.
## Step Variable Removed R-Square R-Square C(p) AIC RMSE
## ------------------------------------------------------------------------------------
## 1 X4 addition 0.803 0.796 48.8550 68.4060 0.7087
## 2 X3 addition 0.873 0.864 24.2130 57.2082 0.5792
## 3 X7 addition 0.890 0.878 19.6670 54.8305 0.5487
## ------------------------------------------------------------------------------------
## Stepwise Selection Method
## -------------------------
##
## Candidate Terms:
##
## 1 . X1
## 2 . X2
## 3 . X3
## 4 . X4
## 5 . X5
## 6 . X6
## 7 . X7
## 8 . X8
## 9 . X9
##
##
## Variables Entered/Removed:
##
## - X4 added
## - X3 added
## - X7 added
## - X8 added
## - X9 added
## - X6 added
##
## No more variables to be added or removed.
##
##
## Stepwise Summary
## --------------------------------------------------------------------------
## Variable Method AIC RSS Sum Sq R-Sq Adj. R-Sq
## --------------------------------------------------------------------------
## X4 addition 68.406 14.063 57.330 0.80302 0.79599
## X3 addition 57.208 9.057 62.335 0.87313 0.86373
## X7 addition 54.830 7.828 63.565 0.89036 0.87771
## X8 addition 54.522 7.248 64.144 0.89848 0.88223
## X9 addition 44.504 4.856 66.537 0.93199 0.91782
## X6 addition 39.161 3.801 67.591 0.94675 0.93286
## --------------------------------------------------------------------------
## Stepwise Selection Method
## ---------------------------
##
## Candidate Terms:
##
## 1. X1
## 2. X2
## 3. X3
## 4. X5
## 5. X6
## 6. X7
## 7. X8
## 8. X9
##
## We are selecting variables based on p value...
##
## Variables Entered/Removed:
##
## - X1 added
## - X3 added
## - X7 added
## - X6 added
## - X8 added
## - X9 added
##
## No more variables to be added/removed.
##
##
## Final Model Output
## ------------------
##
## Model Summary
## -------------------------------------------------------------
## R 0.971 RMSE 0.421
## R-Squared 0.943 Coef. Var 6.618
## Adj. R-Squared 0.928 MSE 0.178
## Pred R-Squared 0.900 MAE 0.292
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -------------------------------------------------------------------
## Regression 67.310 6 11.218 63.195 0.0000
## Residual 4.083 23 0.178
## Total 71.393 29
## -------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) 2.307 0.410 5.623 0.000 1.458 3.156
## X1 0.207 0.053 0.368 3.897 0.001 0.097 0.317
## X3 0.263 0.022 0.680 11.944 0.000 0.217 0.308
## X7 5.453 1.002 0.542 5.442 0.000 3.380 7.525
## X6 -0.532 0.144 -0.192 -3.688 0.001 -0.831 -0.234
## X8 0.613 0.137 0.495 4.462 0.000 0.329 0.897
## X9 -0.433 0.112 -0.435 -3.864 0.001 -0.665 -0.201
## ----------------------------------------------------------------------------------------
##
## Stepwise Selection Summary
## -------------------------------------------------------------------------------------
## Added/ Adj.
## Step Variable Removed R-Square R-Square C(p) AIC RMSE
## -------------------------------------------------------------------------------------
## 1 X1 addition 0.527 0.510 154.8520 94.7131 1.0987
## 2 X3 addition 0.812 0.798 47.7990 68.9988 0.7050
## 3 X7 addition 0.872 0.857 26.9890 59.5306 0.5934
## 4 X6 addition 0.893 0.876 20.8070 56.0486 0.5523
## 5 X8 addition 0.906 0.886 18.0270 54.3108 0.5297
## 6 X9 addition 0.943 0.928 5.8470 41.3046 0.4213
## -------------------------------------------------------------------------------------
## Stepwise Selection Method
## -------------------------
##
## Candidate Terms:
##
## 1 . X1
## 2 . X2
## 3 . X3
## 4 . X5
## 5 . X6
## 6 . X7
## 7 . X8
## 8 . X9
##
##
## Variables Entered/Removed:
##
## - X1 added
## - X3 added
## - X7 added
## - X6 added
## - X8 added
## - X9 added
##
## No more variables to be added or removed.
##
##
## Stepwise Summary
## --------------------------------------------------------------------------
## Variable Method AIC RSS Sum Sq R-Sq Adj. R-Sq
## --------------------------------------------------------------------------
## X1 addition 94.713 33.799 37.594 0.52658 0.50967
## X3 addition 68.999 13.418 57.974 0.81205 0.79813
## X7 addition 59.531 9.155 62.237 0.87176 0.85696
## X6 addition 56.049 7.626 63.766 0.89318 0.87609
## X8 addition 54.311 6.733 64.660 0.90569 0.88604
## X9 addition 41.305 4.083 67.310 0.94281 0.92789
## --------------------------------------------------------------------------
## Stepwise Selection Method
## ---------------------------
##
## Candidate Terms:
##
## 1. X2
## 2. X3
## 3. X4
## 4. X5
## 5. X6
## 6. X7
## 7. X8
## 8. X9
##
## We are selecting variables based on p value...
##
## Variables Entered/Removed:
##
## - X4 added
## - X3 added
## - X7 added
##
## No more variables to be added/removed.
##
##
## Final Model Output
## ------------------
##
## Model Summary
## -------------------------------------------------------------
## R 0.944 RMSE 0.549
## R-Squared 0.890 Coef. Var 8.618
## Adj. R-Squared 0.878 MSE 0.301
## Pred R-Squared 0.854 MAE 0.414
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -------------------------------------------------------------------
## Regression 63.565 3 21.188 70.378 0.0000
## Residual 7.828 26 0.301
## Total 71.393 29
## -------------------------------------------------------------------
##
## Parameter Estimates
## -------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## -------------------------------------------------------------------------------------
## (Intercept) 2.872 0.547 5.254 0.000 1.748 3.995
## X4 0.122 0.033 0.559 3.730 0.001 0.055 0.189
## X3 0.168 0.040 0.435 4.165 0.000 0.085 0.251
## X7 3.106 1.537 0.309 2.021 0.054 -0.053 6.266
## -------------------------------------------------------------------------------------
##
## Stepwise Selection Summary
## ------------------------------------------------------------------------------------
## Added/ Adj.
## Step Variable Removed R-Square R-Square C(p) AIC RMSE
## ------------------------------------------------------------------------------------
## 1 X4 addition 0.803 0.796 52.5890 68.4060 0.7087
## 2 X3 addition 0.873 0.864 26.6180 57.2082 0.5792
## 3 X7 addition 0.890 0.878 21.7450 54.8305 0.5487
## ------------------------------------------------------------------------------------
## Stepwise Selection Method
## -------------------------
##
## Candidate Terms:
##
## 1 . X2
## 2 . X3
## 3 . X4
## 4 . X5
## 5 . X6
## 6 . X7
## 7 . X8
## 8 . X9
##
##
## Variables Entered/Removed:
##
## - X4 added
## - X3 added
## - X7 added
## - X8 added
## - X9 added
## - X6 added
##
## No more variables to be added or removed.
##
##
## Stepwise Summary
## --------------------------------------------------------------------------
## Variable Method AIC RSS Sum Sq R-Sq Adj. R-Sq
## --------------------------------------------------------------------------
## X4 addition 68.406 14.063 57.330 0.80302 0.79599
## X3 addition 57.208 9.057 62.335 0.87313 0.86373
## X7 addition 54.830 7.828 63.565 0.89036 0.87771
## X8 addition 54.522 7.248 64.144 0.89848 0.88223
## X9 addition 44.504 4.856 66.537 0.93199 0.91782
## X6 addition 39.161 3.801 67.591 0.94675 0.93286
## --------------------------------------------------------------------------
# Build model "437896" (predictors X4, X3, X7, X8, X9, X6)
model_wf_437896_log <- lm(log(y) ~ X4 + X3 + X7 + X8 + X9 + X6, data=table_wf)
ols_regress(model_wf_437896_log)
## Model Summary
## -------------------------------------------------------------
## R 0.973 RMSE 0.407
## R-Squared 0.947 Coef. Var 6.385
## Adj. R-Squared 0.933 MSE 0.165
## Pred R-Squared 0.908 MAE 0.273
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -------------------------------------------------------------------
## Regression 67.591 6 11.265 68.16 0.0000
## Residual 3.801 23 0.165
## Total 71.393 29
## -------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) 2.692 0.445 6.046 0.000 1.771 3.613
## X4 0.109 0.026 0.499 4.244 0.000 0.056 0.162
## X3 0.184 0.032 0.476 5.698 0.000 0.117 0.251
## X7 4.085 1.213 0.406 3.367 0.003 1.575 6.595
## X8 0.612 0.133 0.493 4.614 0.000 0.337 0.886
## X9 -0.448 0.108 -0.450 -4.135 0.000 -0.672 -0.224
## X6 -0.368 0.146 -0.133 -2.526 0.019 -0.669 -0.066
## ----------------------------------------------------------------------------------------
# Collinearity Diagnostics #
ols_coll_diag(model_wf_437896_log)
## Tolerance and Variance Inflation Factor
## ---------------------------------------
## # A tibble: 6 x 3
## Variables Tolerance VIF
## <chr> <dbl> <dbl>
## 1 X4 0.167 5.97
## 2 X3 0.332 3.01
## 3 X7 0.159 6.28
## 4 X8 0.202 4.94
## 5 X9 0.195 5.12
## 6 X6 0.839 1.19
##
##
## Eigenvalue and Condition Index
## ------------------------------
## Eigenvalue Condition Index intercept X4 X3 X7 X8 X9 X6
## 1 6.06603799 1.000000 0.0007146972 0.0012879224 0.001577168 6.255392e-04 0.0008014993 0.0009066212 0.00304547
## 2 0.33834763 4.234196 0.0025641623 0.1040613624 0.005790171 1.173277e-02 0.0084021537 0.0065708241 0.02244379
## 3 0.31443225 4.392270 0.0007827736 0.0005853923 0.117541638 3.247682e-03 0.0152300675 0.0281423928 0.02635038
## 4 0.18092020 5.790406 0.0043202653 0.0171208486 0.087238632 1.883680e-02 0.0099587626 0.0185479834 0.30810079
## 5 0.07065103 9.266022 0.1767257312 0.0717821748 0.003008880 4.520519e-02 0.0114424989 0.0089921454 0.54826833
## 6 0.01847255 18.121293 0.0001449205 0.0053427347 0.008960972 7.477679e-06 0.9456383423 0.9366327464 0.03064530
## 7 0.01113836 23.336833 0.8147474499 0.7998195649 0.775882539 9.203445e-01 0.0085266757 0.0002072867 0.06114594
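The Tolerance and VIF columns above come from regressing each predictor on the remaining predictors: VIF_j = 1 / (1 − R_j²), and tolerance is its reciprocal. A sketch using mtcars (the helper `vif_manual` is an illustrative name, not an olsrr function):

```r
# Compute VIFs by hand: regress each column of the design matrix on the others.
vif_manual <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]   # drop the intercept column
  sapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, -j, drop = FALSE]))$r.squared
    1 / (1 - r2)                                  # tolerance = 1 / VIF
  })
}
vif_manual(lm(mpg ~ wt + cyl + hp, data = mtcars))
```

Values near 1 indicate little collinearity; the common rule of thumb flags VIFs above 10.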
# Model Fit Assessment
ols_plot_diagnostics(model_wf_437896_log)
# Correlation between observed residuals and the residuals expected under normality
ols_test_correlation(model_wf_437896_log)
## [1] 0.9837263
# Residual normality tests: a large p-value indicates no evidence against normality
ols_test_normality(model_wf_437896_log)
## -----------------------------------------------
## Test Statistic pvalue
## -----------------------------------------------
## Shapiro-Wilk 0.9728 0.6175
## Kolmogorov-Smirnov 0.0997 0.8982
## Cramer-von Mises 4.8429 0.0000
## Anderson-Darling 0.2996 0.5612
## -----------------------------------------------
# Variable Contributions
ols_plot_added_variable(model_wf_437896_log)
# Residual Plus Component Plot
ols_plot_comp_plus_resid(model_wf_437896_log)
# Build model "437" (predictors X4, X3, X7)
model_wf_437_log <- lm(log(y) ~ X4 + X3 + X7, data=table_wf)
ols_regress(model_wf_437_log)
## Model Summary
## -------------------------------------------------------------
## R 0.944 RMSE 0.549
## R-Squared 0.890 Coef. Var 8.618
## Adj. R-Squared 0.878 MSE 0.301
## Pred R-Squared 0.854 MAE 0.414
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -------------------------------------------------------------------
## Regression 63.565 3 21.188 70.378 0.0000
## Residual 7.828 26 0.301
## Total 71.393 29
## -------------------------------------------------------------------
##
## Parameter Estimates
## -------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## -------------------------------------------------------------------------------------
## (Intercept) 2.872 0.547 5.254 0.000 1.748 3.995
## X4 0.122 0.033 0.559 3.730 0.001 0.055 0.189
## X3 0.168 0.040 0.435 4.165 0.000 0.085 0.251
## X7 3.106 1.537 0.309 2.021 0.054 -0.053 6.266
## -------------------------------------------------------------------------------------
# Collinearity Diagnostics #
ols_coll_diag(model_wf_437_log)
## Tolerance and Variance Inflation Factor
## ---------------------------------------
## # A tibble: 3 x 3
## Variables Tolerance VIF
## <chr> <dbl> <dbl>
## 1 X4 0.188 5.32
## 2 X3 0.386 2.59
## 3 X7 0.181 5.53
##
##
## Eigenvalue and Condition Index
## ------------------------------
## Eigenvalue Condition Index intercept X4 X3 X7
## 1 3.51967088 1.000000 0.002498604 0.004729201 0.005661232 0.002143973
## 2 0.30192504 3.414298 0.006749341 0.054452269 0.142053567 0.018736742
## 3 0.16605489 4.603893 0.068513979 0.150366864 0.078164912 0.024726360
## 4 0.01234919 16.882304 0.922238076 0.790451666 0.774120289 0.954392926
# Model Fit Assessment
ols_plot_diagnostics(model_wf_437_log)
# Correlation between observed residuals and the residuals expected under normality
ols_test_correlation(model_wf_437_log)
## [1] 0.9856766
# Residual normality tests: a large p-value indicates no evidence against normality
ols_test_normality(model_wf_437_log)
## -----------------------------------------------
## Test Statistic pvalue
## -----------------------------------------------
## Shapiro-Wilk 0.9765 0.7267
## Kolmogorov-Smirnov 0.1033 0.8736
## Cramer-von Mises 3.1908 0.0000
## Anderson-Darling 0.3511 0.4469
## -----------------------------------------------
# Variable Contributions
ols_plot_added_variable(model_wf_437_log)
# Residual Plus Component Plot
ols_plot_comp_plus_resid(model_wf_437_log)
# Build model "137689" (predictors X1, X3, X7, X6, X8, X9)
model_wf_137689_log <- lm(log(y) ~ X1 + X3 + X7 + X6 + X8 + X9, data=table_wf)
ols_regress(model_wf_137689_log)
## Model Summary
## -------------------------------------------------------------
## R 0.971 RMSE 0.421
## R-Squared 0.943 Coef. Var 6.618
## Adj. R-Squared 0.928 MSE 0.178
## Pred R-Squared 0.900 MAE 0.292
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -------------------------------------------------------------------
## Regression 67.310 6 11.218 63.195 0.0000
## Residual 4.083 23 0.178
## Total 71.393 29
## -------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) 2.307 0.410 5.623 0.000 1.458 3.156
## X1 0.207 0.053 0.368 3.897 0.001 0.097 0.317
## X3 0.263 0.022 0.680 11.944 0.000 0.217 0.308
## X7 5.453 1.002 0.542 5.442 0.000 3.380 7.525
## X6 -0.532 0.144 -0.192 -3.688 0.001 -0.831 -0.234
## X8 0.613 0.137 0.495 4.462 0.000 0.329 0.897
## X9 -0.433 0.112 -0.435 -3.864 0.001 -0.665 -0.201
## ----------------------------------------------------------------------------------------
# Collinearity Diagnostics #
ols_coll_diag(model_wf_137689_log)
## Tolerance and Variance Inflation Factor
## ---------------------------------------
## # A tibble: 6 x 3
## Variables Tolerance VIF
## <chr> <dbl> <dbl>
## 1 X1 0.279 3.58
## 2 X3 0.768 1.30
## 3 X7 0.251 3.99
## 4 X6 0.917 1.09
## 5 X8 0.202 4.94
## 6 X9 0.196 5.10
##
##
## Eigenvalue and Condition Index
## ------------------------------
## Eigenvalue Condition Index intercept X1 X3 X7 X6 X8 X9
## 1 5.87754632 1.000000 0.0009583976 0.002753729 0.003752062 0.0010622321 0.003554176 0.0008552526 0.0009724289
## 2 0.55370282 3.258064 0.0022409295 0.184392777 0.048522556 0.0065506985 0.008922660 0.0011708384 0.0004973735
## 3 0.29276462 4.480626 0.0006141795 0.021479187 0.183073381 0.0001146981 0.036353961 0.0263007081 0.0424526993
## 4 0.15641976 6.129884 0.0050342867 0.054272760 0.378429650 0.0153101550 0.417215036 0.0036091737 0.0092883543
## 5 0.08151723 8.491283 0.1288772252 0.136506302 0.004864520 0.1464814595 0.482676303 0.0146393598 0.0085093969
## 6 0.02005845 17.117856 0.6268955520 0.471310996 0.317220373 0.5780521912 0.030882409 0.1844189407 0.2640851038
## 7 0.01799079 18.074773 0.2353794295 0.129284248 0.064137458 0.2524285656 0.020395454 0.7690057267 0.6741946434
# Model Fit Assessment
ols_plot_diagnostics(model_wf_137689_log)
# Correlation between observed residuals and the residuals expected under normality
ols_test_correlation(model_wf_137689_log)
## [1] 0.988106
# Residual normality tests: a large p-value indicates no evidence against normality
ols_test_normality(model_wf_137689_log)
## -----------------------------------------------
## Test Statistic pvalue
## -----------------------------------------------
## Shapiro-Wilk 0.9769 0.7382
## Kolmogorov-Smirnov 0.0771 0.9881
## Cramer-von Mises 4.4689 0.0000
## Anderson-Darling 0.1644 0.9350
## -----------------------------------------------
# Variable Contributions
ols_plot_added_variable(model_wf_137689_log)
# Residual Plus Component Plot
ols_plot_comp_plus_resid(model_wf_137689_log)
# Lack of Fit F Test
ols_pure_error_anova(lm(y~X8, data = table_wf))
## Lack of Fit F Test
## ---------------
## Response : y
## Predictor: X8
##
## Analysis of Variance Table
## -------------------------------------------------------------------------
## DF Sum Sq Mean Sq F Value Pr(>F)
## -------------------------------------------------------------------------
## X8 1 4616882.92 4616882.92 5.795558 0.02290414
## Residual 28 36951252.44 1319687.59
## Lack of fit 21 31374881.28 1494041.97 1.875466 0.2003839
## Pure Error 7 5576371.17 796624.45
## -------------------------------------------------------------------------
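The decomposition behind the table above splits SSE into pure error (from replicated X8 values) and lack of fit. A sketch of the computation, assuming a single predictor with replicates (`lof_test` is an illustrative helper, not part of olsrr):

```r
# Lack-of-fit F test for a simple linear regression:
# SSE = SS(lack of fit) + SS(pure error), with pure error computed from
# squared deviations about the group mean at each replicated x value.
lof_test <- function(y, x) {
  fit  <- lm(y ~ x)
  sse  <- sum(resid(fit)^2)
  sspe <- sum(tapply(y, x, function(g) sum((g - mean(g))^2)))
  sslf <- sse - sspe
  m <- length(unique(x))   # number of distinct x values
  n <- length(y)
  f <- (sslf / (m - 2)) / (sspe / (n - m))
  c(F = f, p.value = pf(f, m - 2, n - m, lower.tail = FALSE))
}
```

A large p-value (as in the output above, 0.20) means no significant lack of fit, i.e. the linear model is adequate relative to the pure-error benchmark.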
ols_press(model_wf_437896_log)
## [1] 6.538275
ols_press(model_wf_437_log)
## [1] 10.43262
ols_press(model_wf_137689_log)
## [1] 7.114336
# Predicted R-squared of model 437896 from PRESS.
# Note: the model fits log(y), so the total SS should be computed on
# log(table_wf$y); using y as below inflates the value toward 1.
1-((ols_press(model_wf_437896_log))/(var(table_wf$y)*(nrow(table_wf)-1)))
## [1] 0.9999998
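PRESS itself can be reproduced from the leverages without refitting the model n times, since each leave-one-out residual equals e_i / (1 − h_ii). A sketch on mtcars (illustrative model, since table_wf is not reproduced here):

```r
# PRESS = sum of squared leave-one-out (deleted) residuals,
# obtained from the ordinary residuals and the hat values.
fit     <- lm(mpg ~ wt + cyl, data = mtcars)
press   <- sum((resid(fit) / (1 - hatvalues(fit)))^2)
# Predicted R-squared uses the total SS of the same response the model fits
pred_r2 <- 1 - press / (var(mtcars$mpg) * (nrow(mtcars) - 1))
c(press, pred_r2)
```

Because the response here is untransformed, the TSS term is computed on the same scale as the fitted model, which is the point flagged for the log(y) models above.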
library(texreg)
# Pretty print regression results on screen
lm(mpg ~ wt, data=mtcars) %>% screenreg()
texreg::screenreg(l=list(model_2_7_8))
# Visualizing pairwise relationships
library(GGally)
ggpairs(data=table_b1[c(1,3,8,9)])
# Correlation
cor(table_b1)
# Half correlation matrix:
library(corrr)
mtcars %>% correlate() %>% shave() %>% fashion()
# Visualize correlation matrix:
mtcars %>% correlate() %>% shave() %>% rplot()
# Scatterplot Matrix
mtcars[1:6] %>% plot
# Better looking version
library(ggfortify)
model_2_7_8 %>% autoplot()
# Confidence interval of coefficients
lm(mpg ~ wt + cyl, data=mtcars) %>% confint()
# Hypothesis testing of nested models
lm_mpg_wt <- lm(mpg ~ wt, data=mtcars)
lm_mpg_wt.cyl <- lm(mpg ~ wt + cyl, data=mtcars)
anova(lm_mpg_wt, lm_mpg_wt.cyl)
# Convert mpg to kilometers per liter, then count the rows with mpg > 20
mtcars %>% mutate(kmpl = mpg * 0.425144) %>% select(mpg, kmpl) %>% filter(mpg > 20) %>%
  nrow()
mtcars %>% group_by(am) %>%
  summarize(n=n(),
            mean_mpg=mean(mpg),
            sd_mpg=sd(mpg),
            min_mpg=min(mpg),
            max_mpg=max(mpg))
mtcars %>% arrange(desc(mpg))
# mean & sd
mtcars %>% summarize(am_mean=mean(am), am_sd=sd(am))
# Frequencies by categories
mtcars %>% group_by(am) %>% tally
# Create a categorical transit_level variable from the continuous transit share
# (case_when conditions are evaluated in order, so "high" is checked first)
(californiatod <- californiatod %>%
mutate(transit_level=case_when(
transit>0.4~"high",
transit>0.2~"medium",
TRUE ~ "low")))
## General linear F test
fit_R <- lm(mpg ~ wt, data=mtcars)
fit_F <- lm(mpg ~ wt + cyl, data=mtcars)
anova(fit_R, fit_F)
SSE_R <- resid(fit_R)^2 %>% sum
SSE_F <- resid(fit_F)^2 %>% sum
df_R <- df.residual(fit_R)
df_F <- df.residual(fit_F)
F_val <- ((SSE_R - SSE_F)/(df_R - df_F))/(SSE_F/df_F)
# Look up the critical F value for alpha=0.05
alpha <- 0.05
qf(alpha, (df_R - df_F), df_F, lower.tail=F)
# Alternatively, find the p-value corresponding to our F_val
pf(F_val, (df_R - df_F), df_F, lower.tail=F)
n <- nrow(mtcars) # number of observations
k <- length(coef(fit_R)) # number of coefficients
## Calculate R2 and adjusted R2 manually
TSS <- sd(mtcars$mpg)^2 * (n - 1)
# OR
TSS <- var(mtcars$mpg) * (n - 1)
(R2_R <- 1 - SSE_R/TSS)
(R2_R_adj <- 1 - (SSE_R/(n - k))/(TSS/(n - 1)))
# Interaction Terms (huxreg() is from the huxtable package)
library(huxtable)
huxreg(
lm(houseval ~ transit, data=californiatod),
lm(houseval ~ transit * railtype, data=californiatod),
lm(houseval ~ transit * region, data=californiatod),
lm(houseval ~ transit * CA, data=californiatod))
# redefine the region variables with a new reference category (4 for SD)
catod2 <- californiatod %>% mutate(region = relevel(as.factor(region), ref = 4))
lm(houseval ~ region, data=catod2) %>% summary
# Partial F test:
catod3 <- californiatod %>% mutate(region = ifelse(region =="LA" | region == "SD", "LA_SD", region))
lm(houseval ~ region, data=catod3) %>% summary
anova(lm(houseval ~ region, data=catod3), lm(houseval ~ region, data=californiatod))
# Hypothesis testing of linear combination of coefficients
car::lht(model_2_7_8, "x2 = x7")
# Partial F test: H0: β2 + β3 = 0
car::lht(lm(hours ~ married*women, data=chores), "women + married:women = 0")
# Linear combination of coefficients
# The point estimate is β̂2 + β̂3. Because this linear combination is a sum
# rather than a difference of two coefficients, its standard error is estimated by:
# $\sqrt{\hat{\sigma}^2_{\hat{\beta}_2} + \hat{\sigma}^2_{\hat{\beta}_3} + 2\,\widehat{cov}(\hat{\beta}_2, \hat{\beta}_3)}$
fit1 <- lm(hours ~ married*women, data=chores)
beta2 <- coef(fit1)["women"]
beta3 <- coef(fit1)["married:women"]
betas_vcov <- vcov(fit1)
se <- sqrt(betas_vcov["women", "women"] + betas_vcov["married:women", "married:women"] + 2 * betas_vcov["women", "married:women"])
(t_stat <- (beta2 + beta3)/se)
## Degrees of Freedom
dof <- fit1$df.residual
## compare t_stat to critical t-value
(t_crit <- qt(0.025, df=dof, lower.tail = F))
## OR find the corresponding two-sided p-value
(p_val <- 2 * pt(abs(t_stat), lower.tail = F, df=dof))
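The same standard error can be obtained in one step with the matrix form se(a'β̂) = sqrt(a' V a), where a is the contrast vector and V the coefficient covariance matrix; a sketch, assuming the coefficient order (Intercept), married, women, married:women reported by `coef(fit1)`:

```r
# Matrix form of the standard error of a linear combination a'beta:
# se(a'beta_hat) = sqrt(a' V a), where V = vcov(fit1)
a <- c(0, 0, 1, 1)  # picks out beta2 + beta3 (women + married:women)
se_matrix <- drop(sqrt(t(a) %*% vcov(fit1) %*% a))
all.equal(se_matrix, unname(se))  # should agree with the manual computation above
```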
# Partial F test on the nonlinear term
anova(lm(houseval ~ density, data=californiatod),lm(houseval ~ density + I(density^2), data=californiatod))
# To be on the safe side, enclose your transformation in an I() function. This is not necessary for log transformations.
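A quick illustration of why I() matters (a sketch using mtcars): inside a formula, `^` is the factor-crossing operator, so `wt^2` collapses back to `wt` and the quadratic term is silently lost without I().

```r
# Inside a formula, ^ means crossing, so wt + wt^2 reduces to just wt
fit_wrong <- lm(mpg ~ wt + wt^2, data = mtcars)    # quadratic term silently dropped
fit_right <- lm(mpg ~ wt + I(wt^2), data = mtcars) # genuine quadratic term
length(coef(fit_wrong))  # 2 coefficients: intercept and wt
length(coef(fit_right))  # 3 coefficients: intercept, wt, wt^2
```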
library(olsrr)
# leverage (hat)
leverage <- ols_leverage(lm_sfr)
ols_rsdlev_plot(lm_sfr)
# Cook's distance
ols_cooksd_chart(lm_sfr)
# DFFITS
ols_dffits_plot(lm_sfr)
# DFBETAS
ols_dfbetas_panel(lm_sfr)
# Heteroskedasticity
ols_rvsp_plot(lm_sfr)
ols_rsd_qqplot(lm_sfr)
# hypothesis test of normality of residuals
ols_norm_test(lm_sfr)
# Test of Heteroskedasticity with Breusch-Pagan Test
ols_bp_test(lm_sfr)
#Heteroskedasticity-Consistent Standard Errors
# standard variance-covariance matrix
vcov0 <- vcov(lm_sfr)
vcov0
# convert the covariance matrix to a correlation matrix
cov2cor(vcov0)
# Heteroskedasticity-Consistent variance covariance matrix
require(car)
vcov_hc3 <- hccm(lm_sfr, type="hc3")
# In the presence of heteroskedasticity, the entries of vcov_hc3 are typically larger than those of vcov0;
# redo the hypothesis tests with the heteroskedasticity-consistent variance-covariance matrix
if (!require(lmtest)) { install.packages("lmtest"); library(lmtest) }
coeftest(lm_sfr, vcov_hc3)
# All possible subset
sfrmodel <- lm(TOTALVAL ~ BLDGSQFT + YEARBUILT + GIS_ACRES + dpioneer + dfwy + dpark + dmax + dbikehq, data = taxlot_sfr)
(sfrmodel_all_subset <- ols_all_subset(sfrmodel))
# Best Subset Regression
ols_best_subset(model_2_7_8)
# Multicollinearity with VIF
ols_vif_tol(lm_sfr)
## Stepwise Forward Regression
# based on p-value
(sfrmodel_stepfwd.p <- ols_step_forward(sfrmodel))
# based on AIC
(sfrmodel_stepfwd.aic <- ols_stepaic_forward(sfrmodel))
## Stepwise Backward Regression
# based on p-value
(sfrmodel_stepbwd.p <- ols_step_backward(sfrmodel))
# based on AIC
(sfrmodel_stepbwd.aic <- ols_stepaic_backward(sfrmodel))
## Step AIC regression
# Build a regression model from a set of candidate predictor variables by entering and removing predictors based on the Akaike Information Criterion, in a stepwise manner, until no variable is left to enter or remove. The starting model should include all the candidate predictor variables.
(sfrmodel_stepboth.aic <- ols_stepaic_both(sfrmodel))
# Cross Validation: CV assesses how the results of a model will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.
library(modelr)
library(purrr)
(taxlot_sfr_kcv <- taxlot_sfr %>%
modelr::crossv_kfold() %>%
mutate(model=map(train, ~lm(TOTALVAL~BLDGSQFT+YEARBUILT+GIS_ACRES+dpioneer+dfwy, data=.x)),
rmse=map2_dbl(model, test, modelr::rmse),
rsquare=map2_dbl(model, test, modelr::rsquare)))
taxlot_sfr_kcv %>%
summarise_at(c("rmse", "rsquare"), mean)
## DID omitted
## Discrete Outcome: Count/Poisson Regression
require(MASS)
require(huxtable)
fit_lm <- lm(carb ~ mpg + qsec, data=mtcars)
fit_glm <- glm(carb ~ mpg + qsec, data=mtcars, family="poisson")
huxreg(OLS=fit_lm, Poisson=fit_glm)
fit_lm <- lm(am ~ qsec + hp, data=mtcars)
fit_glm <- glm(am ~ qsec + hp, data=mtcars, family=binomial("logit"))
huxreg(OLS=fit_lm, logit=fit_glm)
# log Likelihood
logLik(fit_glm)
fit_glm0 <- update(fit_glm, .~1)
logLik(fit_glm0)
## 'log Lik.' -21.61487 (df=1)
# pseudo R2
1 - logLik(fit_glm)/logLik(fit_glm0)
## 'log Lik.' 0.381052 (df=3)
# Interpretation of coefficients
# odds ratio
(odds <- exp(coef(fit_glm)))
#prob
odds/(1 + odds)
huxtable::huxreg(model_2_7_8, statistics = NULL)
library(leaps) # Load the package #
model_wf_subset <- regsubsets(log(y) ~ X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9, data=table_wf, nbest=10) # nbest is the number of models from each size #
summary(model_wf_subset) # Hard to read output from this #
## plot adjusted R square for each model ##
plot(model_wf_subset, scale='adjr2')
## can use Cp, r2 or bic for scale ##
plot(model_wf_subset, scale='bic')
plot(model_wf_subset, scale='Cp')
shapiro.test(rstudent(model_wf_reduce_log)) # A large p-value indicates no problem of non-normality #
library(GGally)
table_wf_resi <- table_wf %>% mutate(student_residual=rstudent(model_wf_reduce_log))
ggpairs(data=table_wf_resi[c(10,3,4,6,7,8,9,11)])
ggpairs(data=table_wf_resi[c(10,3,4,7,11)])
Anova(model_wf_final)
vif(model_wf_final)
confint(model_wf_final, level = 1 - 0.05/1) # Bonferroni joint confidence intervals: level = 1 - alpha/g, here g = 1 #
plot(model_wf_final, pch=16, col="blue")
#Create Partial Regression plots #
avPlots(model_wf_final)
confint(model_wf_437, level = 1 - 0.05/1) # Bonferroni joint confidence intervals: level = 1 - alpha/g, here g = 1 #
plot(model_wf_437, pch=16, col="blue")
#Create Partial Regression plots #
avPlots(model_wf_437)
deviation <- table_wf$y-mean(table_wf$y)
# Predict_Power = 1 - (PRESS/SST)
1-((MPV::PRESS(model_wf_final))/(deviation%*%deviation)) # Compute SST by multiplying two vectors #
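PRESS can also be computed directly from the ordinary residuals and the leverages, PRESS = Σ (e_i / (1 − h_ii))², which is a useful check on the MPV::PRESS value; a sketch, assuming model_wf_final is the fitted lm object from above:

```r
# PRESS = sum of squared leave-one-out (PRESS) residuals e_i / (1 - h_ii),
# where h_ii are the hat (leverage) values
press_manual <- sum((resid(model_wf_final) / (1 - hatvalues(model_wf_final)))^2)
press_manual  # should match MPV::PRESS(model_wf_final)
```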
# prediction power of full
1-((MPV::PRESS(model_wf_reduce_log))/(var(table_wf$y)*(nrow(table_wf)-1)))
# prediction power of 437
1-((MPV::PRESS(model_wf_437))/(var(table_wf$y)*(nrow(table_wf)-1)))
# prediction power of backward
1-((MPV::PRESS(model_wf_final))/(var(table_wf$y)*(nrow(table_wf)-1)))